perm filename MACHR[4,KMC]3 blob
sn#012461 filedate 1972-11-13 generic text, type T, neo UTF8
00100 COLBY AND MORAVEC
00200
00300
00400 CONTEXT-SENSITIVE FEATURE RECOGNITION FOR COMPUTER UNDERSTANDING OF
00500 TELETYPED NATURAL LANGUAGE DIALOGUES
00600
00700
00800 WHY IS IT SO DIFFICULT FOR MACHINES TO UNDERSTAND NATURAL LANGUAGE?
00900 PERHAPS IT IS BECAUSE MACHINES DO NOT SIMULATE SUFFICIENTLY WHAT
01000 HUMANS DO WHEN HUMANS PROCESS LANGUAGE. SEVERAL YEARS OF EXPERIENCE
01100 WITH COMPUTER SCIENCE AND LINGUISTIC APPROACHES HAVE TAUGHT US THE
01200 SCOPE AND LIMITATIONS OF SYNTACTICAL, SEMANTIC AND CONCEPTUAL
01300 PARSING [THORNE & BRATLEY] [SIMMONS]
01400 [SCHANK] [WILKS] [WOODS] [WINOGRAD]. WHILE CURRENT PARSERS PERFORM
01500 SATISFACTORILY WITH CAREFULLY EDITED TEXT SENTENCES OR WITH
01600 EXPRESSIONS LIMITED TO A TOY WORLD, THEY ARE UNABLE TO DEAL WITH
01700 EVERYDAY LANGUAGE BEHAVIOR CHARACTERISTIC OF HUMAN CONVERSATION. IN
01800 AN UNDERSTANDABLY RATIONALISTIC QUEST FOR CERTAINTY AND ATTRACTED BY
01900 AN ANALOGY FROM THE PROOF THEORY OF LOGICIANS IN WHICH PROVABILITY
02000 IMPLIED COMPUTABILITY, COMPUTATIONAL LINGUISTS HOPED TO DEVELOP
02100 CONTEXT-FREE FORMALISMS FOR NATURAL LANGUAGE GRAMMARS. BUT THE HOPE HAS NOT
02200 BEEN REALIZED AND PERHAPS IN PRINCIPLE CANNOT BE. (IT IS DIFFICULT
02300 TO FORMALIZE SOMETHING WHICH CAN HARDLY BE FORMULATED). IN THEIR
02400 DIALOGUES HUMANS ARE NEVER CONTEXT-FREE LINGUISTICALLY OR
02500 CONCEPTUALLY. THE MAIN PROBLEM IS HOW TO MODEL THIS
02600 CONTEXT-SENSITIVITY.
02700
02800 LINGUISTIC PARSERS USE MORPHOGRAPHEMIC ANALYSES, PARTS-OF-SPEECH
02900 ASSIGNMENTS AND DICTIONARIES CONTAINING MULTIPLE WORD-SENSES EACH
03000 POSSESSING SEMANTIC FEATURES FOR RESTRICTING WORD COMBINATIONS. SUCH
03100 PARSERS PERFORM A WORD-BY-WORD ANALYSIS OF THE INPUT, VALIANTLY
03200 DISAMBIGUATING AT EACH STEP IN AN ATTEMPT TO CONSTRUCT A MEANINGFUL
03300 INTERPRETATION. WHILE IT MAY BE SOPHISTICATED COMPUTATIONALLY, A
03400 LINGUISTIC PARSER IS QUITE USELESS FOR THE UNDERSTANDING OF ORDINARY
03500 CONVERSATION. IN EVERYDAY DISCOURSE PEOPLE SPEAK COLLOQUIALLY AND
03600 IDIOMATICALLY USING ALL SORTS OF PAT PHRASES (`YOU SAID IT'), SLANG
03700 (`LET'S RAP') AND CLICHES (`THAT'S THE WAY IT GOES'). THEY ARE CRYPTIC
03800 AND ELLIPTIC. THEY LACE EVEN THEIR WRITTEN EXPRESSIONS WITH
03900 MEANINGLESS FUZZ (`WELL NOW LET'S SEE') AND FRAGMENTS (`REALLY'). THEY
04000 CONVEY THEIR INTENTIONS AND IDEAS IN BOTH IDIOSYNCRATIC AND
04100 METAPHORICAL WAYS, BLITHELY VIOLATING RULES OF 'CORRECT' GRAMMAR AND
04200 SYNTAX. GIVEN THESE DIFFICULTIES, HOW IS IT THAT PEOPLE CARRY ON
04300 CONVERSATIONS EASILY MOST OF THE TIME WHILE MACHINES HAVE FOUND IT
04400 EXTREMELY DIFFICULT TO CONTINUE TO MAKE CONCEPTUALLY APPROPRIATE
04500 REPLIES WHICH COMMUNICATE UNDERSTANDING?
04600
04700
04800
04900 IT SEEMS THAT PEOPLE 'GET THE MESSAGE' WITHOUT ANALYZING EVERY SINGLE
05000 WORD IN THE INPUT AND EVEN IGNORING MANY OF ITS TERMS. PEOPLE MAKE
05100 INDIVIDUALISTIC SELECTIONS FROM HIGHLY REDUNDANT AND REPETITIOUS
05200 COMMUNICATIONS. THESE HIGHLY PERSONAL SELECTIVE OPERATIONS PRODUCE
05300 A TRANSFORMATION OF THE INPUT BY DESTROYING AND EVEN DISTORTING
05400 INFORMATION. IN SPEED READING, FOR EXAMPLE, ONLY A SMALL PERCENTAGE
05500 OF CONTENTIVE WORDS ON EACH PAGE NEED BE LOOKED AT. THESE WORDS
05600 SOMEHOW RESONATE WITH THE READER'S RELEVANT CONCEPTUAL-INFERENTIAL
05700 STRUCTURE WHOSE PROCESSES ENABLE HIM TO 'UNDERSTAND' NOT SIMPLY THE
05800 LANGUAGE BUT ALL SORTS OF UNMENTIONED ASPECTS ABOUT THE SITUATIONS
05900 AND EVENTS BEING REFERRED TO IN THE LANGUAGE. IN WRITTEN TEXTS 5/6
06000 OF THE INPUT CAN BE DISTORTED OR DELETED AND THE INTENDED MESSAGE CAN
06100 STILL SUCCESSFULLY BE EXTRACTED. SPOKEN CONVERSATIONS IN ENGLISH ARE
06200 KNOWN TO BE AT LEAST 50% REDUNDANT. HALF THE WORDS CAN BE GARBLED
06300 AND LISTENERS NONETHELESS GET THE GIST OR DRIFT OF WHAT IS BEING
06400 SAID. (GIVE FURTHER EXPERIMENTAL EVIDENCE HERE)
06500
06600 TO APPROXIMATE SUCH HUMAN ACHIEVEMENTS WE REQUIRE A NEW PERSPECTIVE
06700 AND A PRACTICAL METHOD WHICH DIFFERS FROM THAT OF CURRENT LINGUISTIC
06800 PARSING. THIS ALTERNATE APPROACH SHOULD INCORPORATE KNOWLEDGE
06900 GAINED FROM WORK WITH PARSERS BUT SHOULD UTILIZE PRIMARILY
07000 INDIVIDUALISTIC-CONCEPTUAL RATHER THAN GENERAL-GRAMMATICAL FEATURES.
07100 PARSERS REPRESENT COMPLEX AND REFINED ALGORITHMS. WHILE ON ONE HAND
07200 THEY SUBJECT A SENTENCE TO A DETAILED AND SOMETIMES OVERKILLING
07300 ANALYSIS, ON THE OTHER THEY ARE FINICKY AND OVERSENSITIVE. FOR
07400 EXAMPLE, A LINGUISTIC PARSER SIMPLY HALTS IF A WORD IN THE INPUT
07500 SENTENCE IS NOT PRESENT IN ITS DICTIONARY. UNGRAMMATICAL EXPRESSIONS
07600 SUCH AS DOUBLE PREPOSITIONS (`DO YOU WANT TO GET OUT OF FROM THE
07700 HOSPITAL?') ARE QUITE CONFUSING TO THEM. PARSERS CONSTITUTE A
07800 TIGHT CONJUNCTION OF TESTS RATHER THAN A LOOSE DISJUNCTION WHICH PERMITS
07810 PLAUSIBLE GUESSING AND MISUNDERSTANDING. THUS AS MORE AND MORE
07900 TESTS ARE ADDED, THE PARSER BEHAVES LIKE A FINER AND FINER FILTER AND
08000 IT BECOMES HARDER AND HARDER FOR AN EXPRESSION TO PASS THROUGH IT.
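
THE CONTRAST BETWEEN A TIGHT CONJUNCTION AND A LOOSE DISJUNCTION CAN BE
MADE CONCRETE WITH A SMALL SKETCH IN PYTHON. IT IS AN ILLUSTRATION ONLY,
NOT A DESCRIPTION OF ANY PARSER CITED ABOVE: THE DICTIONARY, THE TWO
TESTS, THE CUE SETS AND ALL FUNCTION NAMES ARE ASSUMPTIONS MADE FOR THE
EXAMPLE.

```python
# Illustrative sketch only; every name and word list here is hypothetical.

DICTIONARY = {"do", "you", "want", "to", "get", "out", "of", "from",
              "the", "hospital"}
PREPOSITIONS = {"of", "from"}


def all_words_known(words):
    """Test 1: every word must appear in the dictionary."""
    return all(w in DICTIONARY for w in words)


def no_double_preposition(words):
    """Test 2: no two prepositions may stand side by side."""
    return not any(a in PREPOSITIONS and b in PREPOSITIONS
                   for a, b in zip(words, words[1:]))


TESTS = [all_words_known, no_double_preposition]

# Each cue is a label plus a set of words that must all be present.
CUES = [("LEAVE-HOSPITAL", {"get", "out", "hospital"}),
        ("LEAVE-HOSPITAL", {"leave", "hospital"})]


def conjunctive_parse(words):
    """Tight conjunction: accept only if every test passes."""
    return all(test(words) for test in TESTS)


def disjunctive_guess(words):
    """Loose disjunction: return the label of the first cue found, else None."""
    present = set(words)
    for label, cue in CUES:
        if cue <= present:
            return label
    return None


sentence = "do you want to get out of from the hospital".split()
print(conjunctive_parse(sentence))   # rejected by the double-preposition test
print(disjunctive_guess(sentence))   # a guess is still returned
```

EACH TEST APPENDED TO THE LIST IN THE SKETCH CAN ONLY SHRINK THE SET OF
ACCEPTED INPUTS, WHEREAS EACH ADDED CUE CAN ONLY ENLARGE THE SET OF
INPUTS FOR WHICH SOME GUESS, RIGHT OR WRONG, IS AVAILABLE.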
08100
08200 ON INTUITIVE GROUNDS IT IS HARDLY CREDIBLE THAT CONVENTIONAL PARSERS
08300 MODEL THE MECHANISMS PEOPLE USE IN PROCESSING LANGUAGE. AS CHOMSKY [ ]
08400 HAS REMARKED, `WE NOTED AT THE OUTSET THAT PERFORMANCE AND
08500 COMPETENCE MUST BE SHARPLY DISTINGUISHED IF EITHER IS TO BE STUDIED
08600 SUCCESSFULLY. WE HAVE NOW DESCRIBED A CERTAIN MODEL OF COMPETENCE. IT
08700 WOULD BE TEMPTING, BUT QUITE ABSURD, TO REGARD IT AS A MODEL OF
08800 PERFORMANCE AS WELL. THUS WE MIGHT PROPOSE THAT TO PRODUCE A
08900 SENTENCE THE SPEAKER GOES THROUGH THE SUCCESSIVE STEPS OF
09000 CONSTRUCTING A BASE-DERIVATION, LINE BY LINE FROM THE INITIAL SYMBOL
09100 S, THEN INSERTING LEXICAL ITEMS AND APPLYING GRAMMATICAL
09200 TRANSFORMATIONS TO FORM A SURFACE STRUCTURE, AND FINALLY APPLYING THE
09300 PHONOLOGICAL RULES IN THEIR GIVEN ORDER, IN ACCORDANCE WITH THE
09400 CYCLIC PRINCIPLE DISCUSSED ABOVE. THERE IS NOT THE SLIGHTEST
09500 JUSTIFICATION FOR ANY SUCH ASSUMPTION.' IT SHOULD BE CLEAR FROM THESE
09600 STRICTURES THAT THE TRANSFORMATIONAL APPROACH HAS BEEN CONCERNED WITH
09700 PRODUCTION RATHER THAN INTERPRETATION OF SENTENCES AND THAT IT IS NOT
09800 ORIENTED TOWARDS HUMAN PERFORMANCE BUT TOWARDS AN IDEALIZED GRAMMAR
09900 OF COMPETENCE.
10000
10100 EARLY ATTEMPTS TO DEVELOP A FEATURE-RECOGNITION APPROACH USING
10200 SPECIAL-PURPOSE HEURISTICS HAVE BEEN DESCRIBED BY COLBY, WATT AND
10300 GILBERT [ ], WEIZENBAUM [ ] AND COLBY AND ENEA [ ]. THE LIMITATIONS OF
10400 THESE ATTEMPTS ARE WELL KNOWN TO WORKERS IN ARTIFICIAL INTELLIGENCE.
10500 SUCH PRIMITIVE CONTEXT-RESTRICTED PROGRAMS OFTEN GRASP A TOPIC WELL
10600 ENOUGH BUT TOO OFTEN DO NOT UNDERSTAND QUITE WHAT IS BEING SAID ABOUT
10700 THE TOPIC, WITH AMUSING OR DISASTROUS CONSEQUENCES. THIS SHORTCOMING
10800 IS BOTH LINGUISTIC AND CONCEPTUAL: THE FEATURE-RECOGNITION
10900 ABILITIES OF SUCH PROGRAMS ARE RUDIMENTARY, AND THEY LACK A RICH
11000 CONCEPTUAL STRUCTURE INTO WHICH THE PATTERN ABSTRACTED FROM THE INPUT
11100 CAN BE MATCHED FOR FURTHER INFERENCING. IN OUR EXPERIENCE THE
11200 MAN-MACHINE CONVERSATIONS SOON BECAME IMPOVERISHED AND BORING.
11300 WINOGRAD'S PROGRAM, WHILE LIMITED TO A FEW OBJECTS AND RELATIONS IN A
11400 TOY ROBOTIC WORLD, REPRESENTED A GREAT IMPROVEMENT IN THE
11500 FEATURE-RECOGNITION APPROACH. HOWEVER, MANY OF HIS FEATURES, SUCH AS
11600 DETERMINERS AND NOUN GROUPS, WERE GRAMMATICALLY RATHER THAN
11700 CONCEPTUALLY ORIENTED. ANOTHER FEATURE-RECOGNITION APPROACH IS THAT
11800 OF WILKS [ ], WORKING IN THE AREA OF MACHINE TRANSLATION. HIS ALGORITHM
11900 CONSTRUCTS A PATTERN FROM ENGLISH TEXT INPUT WHICH IS MATCHED AGAINST
12000 TEMPLATES IN AN INTERLINGUAL DATA BASE FROM WHICH, IN TURN, FRENCH
12100 OUTPUT IS GENERATED WITHOUT USING A GENERATIVE GRAMMAR.
12200
12300 IN THE COURSE OF CONSTRUCTING A COMPUTER SIMULATION OF PARANOIA WE
12400 WERE FACED WITH THE PROBLEM OF DEALING WITH NATURAL LANGUAGE AS IT IS
12500 USED IN THE DOCTOR-PATIENT SITUATION OF A PSYCHIATRIC INTERVIEW. THIS
12600 DOMAIN OF DISCOURSE ADMITTEDLY CONTAINS MANY STEREOTYPES (`WHAT
12700 BROUGHT YOU TO THE HOSPITAL?') AND IS CONSTRAINED IN TOPICS (NEWTON'S
12800 LAWS ARE RARELY DISCUSSED). BUT IT IS RICH ENOUGH IN VERBAL BEHAVIOR
12900 TO BE A CHALLENGE TO A LANGUAGE UNDERSTANDING ALGORITHM SINCE A GREAT
13000 VARIETY OF HUMAN EXPERIENCES ARE DISCUSSED IN THIS DOMAIN INCLUDING
13100 THE RELATION WHICH DEVELOPS BETWEEN THE INTERVIEW PARTICIPANTS. THE
13200 JUDGEMENT OF 'PARANOIA' IS MADE BY PSYCHIATRISTS RELYING MAINLY ON
13300 THE VERBAL BEHAVIOR OF THE INTERVIEWED PATIENT. IF A PARANOID MODEL
13400 IS TO EXHIBIT PARANOID BEHAVIOR IN A PSYCHIATRIC INTERVIEW, IT MUST
13500 BE CAPABLE OF HANDLING DIALOGUES TYPICAL OF THE DOCTOR-PATIENT
13600 CONTEXT. SINCE THE MODEL CAN COMMUNICATE ONLY THROUGH TELETYPED
13700 MESSAGES, THE VIS-A-VIS ASPECTS OF THE USUAL PSYCHIATRIC INTERVIEW ARE
13800 ABSENT. THUS THE MODEL SHOULD BE ABLE TO DEAL WITH TYPEWRITTEN
13900 NATURAL LANGUAGE INPUT AND TO OUTPUT REPLIES WHICH ARE INDICATIVE OF
14000 AN UNDERLYING PARANOID THOUGHT PROCESS.
14100
14200 IN A PSYCHIATRIC INTERVIEW THERE IS ALWAYS A WHO SAYING SOMETHING TO
14300 A WHOM WITH DEFINITE INTENTIONS AND EXPECTATIONS. THERE ARE TWO
14400 SITUATIONS TO BE TAKEN INTO ACCOUNT, THE ONE BEING TALKED ABOUT AND
14500 THE ONE THE PARTICIPANTS ARE IN. SOMETIMES THE LATTER BECOMES THE
14600 FORMER. AS WEIZENBAUM [ ] HAS EMPHASIZED FOR COMPUTER SCIENTISTS,
14700 PARTICIPANTS IN DIALOGUES HAVE PURPOSES AND MACHINES MUST RECOGNIZE
14800 THIS FACT. THE DOCTOR'S PURPOSE IS TO GATHER CERTAIN KINDS OF
14900 INFORMATION WHILE THE PATIENT'S PURPOSE IS TO GIVE INFORMATION AND
15000 GET HELP. A JOB IS TO BE DONE; IT IS NOT SMALL TALK. OUR WORKING
15100 HYPOTHESIS IS THAT EACH PARTICIPANT IN THE DIALOGUE UNDERSTANDS THE
15200 OTHER BY MATCHING SELECTED PERSONALLY-SIGNIFICANT FEATURES IN THE
15300 INPUT AGAINST CONCEPTUAL PATTERNS WHICH CONTAIN INFORMATION ABOUT THE
15400 SITUATION OR EVENT BEING DESCRIBED LINGUISTICALLY. THIS
15500 UNDERSTANDING IS COMMUNICATED RECIPROCALLY BY LINGUISTIC RESPONSES
15600 JUDGED APPROPRIATE TO THE INTENTIONS AND EXPECTATIONS OF THE
15700 PARTICIPANTS AND TO THE REQUIREMENTS OF THE SITUATION. IN THIS PAPER
15800 WE SHALL DESCRIBE ONLY THE CONTEXT-SENSITIVE FEATURE-RECOGNITION
15900 PROCESSES USED TO EXTRACT A PATTERN FROM NATURAL LANGUAGE INPUT. IN A
16000 LATER COMMUNICATION WE SHALL DESCRIBE THE INFERENTIAL PROCESSES
16100 CARRIED OUT AT THE CONCEPTUAL LEVEL ONCE THE `PARADIGMATIC' PATTERN
16200 HAS BEEN RECEIVED FROM THE FEATURE-RECOGNITION PROCESSES.
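
THE WORKING HYPOTHESIS STATED ABOVE, THAT UNDERSTANDING PROCEEDS BY
MATCHING SELECTED FEATURES OF THE INPUT AGAINST STORED PATTERNS, CAN BE
SKETCHED IN A FEW LINES OF PYTHON. THE SKETCH IS NOT THE FEATURE
RECOGNIZER OF THE PARANOIA MODEL, WHOSE DESCRIPTION IS STILL TO FOLLOW;
IT IS A MINIMAL ILLUSTRATION UNDER ASSUMPTIONS. THE FUZZ LIST, THE
PATTERN TABLE, THE PATTERN NAMES AND THE FUNCTIONS features AND
recognize ARE ALL HYPOTHETICAL, AND `CONTEXT' IS REDUCED HERE TO THE
NAME OF THE PREVIOUSLY RECOGNIZED PATTERN.

```python
import re

# Hypothetical filler words that carry no features and are skipped.
FUZZ = {"well", "now", "lets", "let's", "see", "um"}

# Hypothetical pattern table: (pattern name, required features, context).
# A context of None means the pattern may be matched at any time; otherwise
# the pattern is eligible only when the previous pattern had that name.
PATTERNS = [
    ("ASK-REASON-FOR-ADMISSION", {"what", "brought", "hospital"}, None),
    ("ASK-REASON-FOR-ADMISSION", {"why", "hospital"}, None),
    ("ASK-WISH-TO-LEAVE", {"want", "out", "hospital"}, None),
    ("ASK-REASON-FOR-WISH", {"why"}, "ASK-WISH-TO-LEAVE"),
]


def features(text):
    """Lower-case the input, split it into words, and drop the fuzz."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in FUZZ}


def recognize(text, context=None):
    """Return the name of the most specific eligible pattern, else None.

    Words not required by any pattern are simply ignored, so unknown or
    ungrammatical material does not block recognition.
    """
    found = features(text)
    best_name, best_size = None, 0
    for name, required, needed_context in PATTERNS:
        if needed_context is not None and needed_context != context:
            continue  # pattern is not eligible in this context
        if required <= found and len(required) > best_size:
            best_name, best_size = name, len(required)
    return best_name


previous = recognize("Do you want to get out of from the hospital?")
print(previous)
print(recognize("Why?", context=previous))
print(recognize("Why?"))
```

IN THIS SKETCH THE FRAGMENT `WHY?' IS RECOGNIZED ONLY WHEN THE PRECEDING
INPUT HAS ESTABLISHED A CONTEXT FOR IT; TAKEN ALONE IT MATCHES NO
PATTERN. THE DOUBLE PREPOSITION IN THE FIRST INPUT IS IGNORED BECAUSE
NO PATTERN ASKS FOR IT.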
16300
16400
16500 (HANS WRITES DESCRIPTION OF HIS FEATURE RECOGNIZER)